5 research outputs found

    Reducing cache hierarchy energy consumption by predicting forwarding and disabling associative sets

    The first-level data cache in modern processors has become a major consumer of energy due to its increasing size and high access frequency. To reduce this energy consumption, we propose a straightforward filtering technique based on a highly accurate forwarding predictor. Specifically, a simple structure predicts whether a load instruction will obtain its data via forwarding from the load-store structure, thus avoiding the data cache access, or whether the data cache will provide it. This mechanism reduces data cache energy consumption by an average of 21.5% with a negligible performance penalty of less than 0.1%. Furthermore, we also address the static energy consumption of the cache by disabling a portion of the sets of the L2 associative cache. Overall, when both proposals are combined, the total L1 and L2 energy consumption is reduced by an average of 29.2% with a performance penalty of just 0.25%.
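The filtering idea in this abstract can be illustrated with a minimal sketch. The abstract does not describe how the predictor is organized, so everything here is an assumption made for illustration: a table of 2-bit saturating counters indexed by the load's program counter, the hypothetical names `ForwardingPredictor` and `simulate`, and a trace format of `(pc, was_forwarded)` pairs.

```python
class ForwardingPredictor:
    """Table of 2-bit saturating counters indexed by load PC (assumed design)."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Counter values 0..1 predict "data cache", 2..3 predict "forwarding".
        self.counters = [0] * entries

    def _index(self, pc):
        # Drop the low byte-offset bits, then wrap into the table.
        return (pc >> 2) % self.entries

    def predict_forwarded(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, was_forwarded):
        i = self._index(pc)
        if was_forwarded:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def simulate(trace, predictor):
    """Count data cache accesses for a trace of (pc, was_forwarded) pairs."""
    stats = {"loads": 0, "cache_accesses": 0, "late_accesses": 0}
    for pc, was_forwarded in trace:
        stats["loads"] += 1
        if predictor.predict_forwarded(pc):
            if not was_forwarded:
                # Misprediction: the data was not available by forwarding,
                # so the cache has to be accessed after all, but late.
                stats["cache_accesses"] += 1
                stats["late_accesses"] += 1
        else:
            # Predicted "cache": access it as usual.
            stats["cache_accesses"] += 1
        predictor.update(pc, was_forwarded)
    return stats
```

In this sketch, a load that always receives its data by forwarding accesses the cache only on its first two executions, after which the counter saturates and the access is filtered. The `late_accesses` count stands in for the source of the small performance penalty the abstract mentions.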

    Wavelet transform for large scale image processing on modern microprocessors

    In this paper we discuss several issues relevant to the vectorization of a 2-D Discrete Wavelet Transform on current microprocessors. Our research builds on previous studies of the efficient exploitation of the memory hierarchy, which has a tremendous impact on performance. We have extended this work with a more detailed analysis based on hardware performance counters and with a study of vectorization; in particular, we have used the Intel Pentium SSE instruction set. Most of our optimizations are performed at source code level to allow automatic vectorization, though some compiler intrinsic functions have been introduced to enhance performance. Taking into account the level of abstraction at which the optimizations are performed, the results obtained on an Intel Pentium III microprocessor are quite satisfactory, even though further improvement can be obtained through more extensive use of compiler intrinsics.

    2-D wavelet transform enhancement on general-purpose microprocessors: memory hierarchy and SIMD parallelism exploitation

    This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. The two topics are related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. However, our SIMD-oriented tuning has been performed exclusively at source code level: we have reordered some loops and introduced some modifications that allow automatic vectorization. Taking into account the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, even though further improvement can be obtained by lowering the level of abstraction (compiler intrinsics or assembly code).
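As a rough illustration of the kind of source-level loop reordering the abstract refers to, the sketch below computes one level of a separable 2-D transform and shows the vertical pass with two loop orders. The Haar filter, the function names, and the specific loop interchange are assumptions chosen for brevity. The abstract does not name the filter or the exact reorderings, and this is not the paper's pipelined computation. Python is used only to show the loop structure; the locality and vectorization benefits apply to row-major arrays in a compiled language.

```python
def haar_rows(img):
    """Horizontal pass: each row becomes [averages | differences]."""
    out = []
    for row in img:
        half = len(row) // 2
        lo = [(row[2 * j] + row[2 * j + 1]) / 2 for j in range(half)]
        hi = [(row[2 * j] - row[2 * j + 1]) / 2 for j in range(half)]
        out.append(lo + hi)
    return out


def haar_cols_strided(img):
    """Vertical pass, column by column: the inner loop jumps a full row per step."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for x in range(w):
        for i in range(h // 2):
            a, b = img[2 * i][x], img[2 * i + 1][x]
            out[i][x] = (a + b) / 2
            out[h // 2 + i][x] = (a - b) / 2
    return out


def haar_cols_rowwise(img):
    """Same vertical pass with the loops interchanged: the inner loop walks along rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h // 2):
        top, bot = img[2 * i], img[2 * i + 1]
        for x in range(w):  # unit-stride accesses, independent iterations
            out[i][x] = (top[x] + bot[x]) / 2
            out[h // 2 + i][x] = (top[x] - bot[x]) / 2
    return out
```

Both vertical passes produce the same coefficients. The row-wise version has an inner loop with contiguous accesses and no dependences between iterations, which is the general shape that auto-vectorizing compilers handle well.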

    Operation of the OpenIRS-UCM tool and its synergies with Moodle (original title: Funcionamiento de la herramienta OpenIRS-UCM y sus sinergias con Moodle)

    Interactive response systems have gained acceptance within the educational community in recent years, and clear evidence of this is the growing number of commercial systems available on the market today. However, most solutions are based on systems that are closed, rigid, and dependent on the software installed on the instructor's computer. In this work we present a new free tool, which we have named OpenIRS-UCM, that incorporates most of the functionality of commercial applications, with the advantage of integrating several types of commercial clicker remotes with other devices such as smartphones, PDAs, laptops, etc. In addition, it can interact with the Moodle virtual campus platform, exponentially increasing its possible uses.